GP-Fileprints: File Types Detection Using Genetic Programming

Authors

  • Ahmed Kattan
  • Edgar Galván López
  • Riccardo Poli
  • Michael O'Neill
Abstract

We propose a novel application of Genetic Programming (GP): the identification of file types via the analysis of raw binary streams (i.e., without the use of metadata). GP evolves programs with multiple components. One component analyses statistical features extracted from the raw byte-series to divide the data into blocks. Another component then analyses these blocks to obtain a signature for each file in a training set. Two further (evolved) program components project these signatures onto a two-dimensional Euclidean space. K-means clustering is applied to group similar signatures, and each cluster is labelled according to the dominant label among its members. Once a program that achieves good classification has been evolved, it can be used on unseen data without requiring any further evolution. Experimental results show that GP compares very well with established file classification algorithms (i.e., Neural Networks, Bayes Networks and J48 Decision Trees).
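The clustering and labelling stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: in the paper the 2-D coordinates come from evolved program components, whereas here they are made-up values, and the function names (`kmeans`, `label_clusters`, `classify`), the seeding of centroids with the first k points, and the file-type labels are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means on 2-D points.

    Assumption: the first k points seed the centroids (a real run
    would typically use random or k-means++ initialisation).
    """
    centroids = [points[i] for i in range(k)]
    assign = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assign) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
        if new_assign == assign:  # no point changed cluster: converged
            break
        assign = new_assign
    return centroids, assign


def label_clusters(assign, labels, k):
    """Give each cluster the dominant (most frequent) label of its members."""
    cluster_labels = {}
    for c in range(k):
        votes = Counter(l for l, a in zip(labels, assign) if a == c)
        cluster_labels[c] = votes.most_common(1)[0][0] if votes else None
    return cluster_labels


def classify(point, centroids, cluster_labels):
    """Label an unseen 2-D signature by its nearest centroid."""
    nearest = min(range(len(centroids)),
                  key=lambda c: math.dist(point, centroids[c]))
    return cluster_labels[nearest]


# Hypothetical projected signatures and file-type labels (not from the paper).
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels = ["jpg", "pdf", "jpg", "pdf", "jpg", "pdf"]

centroids, assign = kmeans(points, k=2)
cluster_labels = label_clusters(assign, labels, k=2)
print(classify((0.3, 0.3), centroids, cluster_labels))
print(classify((4.5, 4.8), centroids, cluster_labels))
```

Labelling by nearest centroid is one simple way to apply the trained clusters to unseen data; the abstract does not say how the authors assign new files to clusters.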

Similar resources

Estimation of Discharge over the Submerged Compound Sharp-Crested Weir using Artificial Neural Networks and Genetic Programming

Truncated sharp crested weirs are used to measure flow rate and control upstream water surface in irrigation canals and laboratory flumes. The main advantages of such weirs are ease of construction and capability of measuring a wide range of flows with sufficient accuracy. Artificial neural networks (ANNs) and genetic programming (GP) have recently been used for estimation of hydraulic data. In...

RELATIONSHIP OF TENSILE STRENGTH OF STEEL FIBER REINFORCED CONCRETE BASED ON GENETIC PROGRAMMING

Estimating mechanical properties of concrete before designing reinforced concrete structures is among the design requirements. Steel fibers have a considerable effect on the mechanical properties of reinforced concrete, particularly its tensile strength. So far, numerous studies have been done to estimate the relationship between tensile strength of steel fiber reinforced concrete (SFRC) and ot...

Application of Genetic Programming to Modeling and Prediction of Activity Coefficient Ratio of Electrolytes in Aqueous Electrolyte Solution Containing Amino Acids

Genetic programming (GP) is one of the computer algorithms in the family of evolutionary-computational methods, which have been shown to provide reliable solutions to complex optimization problems. The genetic programming under discussion in this work relies on tree-like building blocks, and thus supports process modeling with varying structure. In this paper the systems containing amino ac...

Frequency domain analysis of transient flow in pipelines; application of the genetic programming to reduce the linearization errors

Analysing transient flow with the frequency domain method (FDM) is computationally much faster than using the method of characteristics (MOC) in the time domain. FDM needs no discretization in time and space, but requires the linearization of governing equations and boundary conditions. Hence, the FDM is only valid for small perturbations in which the system’s hydraulics is almost linear. In this st...

Estimating scour below inverted siphon structures using stochastic and soft computing approaches

This paper uses nonlinear regression, Artificial Neural Network (ANN) and Genetic Programming (GP) approaches to predict an important practical quantity, namely the scour dimensions downstream of inverted siphon structures. Dimensional analysis and nonlinear regression-based equations were proposed for estimation of maximum scour depth, location of the scour hole, location and height of the dune downs...

Publication date: 2010